Detection of Interaction Effects in a Nonparametric Concurrent Regression Model

Many methods have been developed to study nonparametric function-on-function regression models. Nevertheless, there is a lack of model selection approach to the regression function as a functional function with functional covariate inputs. To study interaction effects among these functional covariates, in this article, we first construct a tensor product space of reproducing kernel Hilbert spaces and build an analysis of variance (ANOVA) decomposition of the tensor product space. We then use a model selection method with the L1 criterion to estimate the functional function with functional covariate inputs and detect interaction effects among the functional covariates. The proposed method is evaluated using simulations and stroke rehabilitation data.


Introduction
Functional data can be found in various fields, such as biology, economics, engineering, geology, medicine and psychology.Recently, statistical methods and theories on functional data were widely studied ( [1][2][3][4][5][6]). Functional data sometimes have more complicated structures.For example, the motivation data of this paper, stroke rehabilitation data, utilized a collection of 3D video games known as Circus Challenge to enhance upper limb function in stroke patients ( [7][8][9]).The patients were scheduled to play the movement game over three months at specified times.At each visit time t, the level of impairment of stroke subject i was measured using the CAHAI (Chedoke Arm and Hand Activity Inventory) score, denoted as y i (t), and movements of upper limbs of patients, such as forward circle movement and sawing movement, were also recorded.The movement data at time t and frequency s from the ith patient were denoted as x i (t, s).Determining a way to model the relationship of y i (t) and functional x i (t, •) is key to studying whether the movements are helpful to the rehabilitation level of the stroke patient or not.Furthermore, there is a question of whether there are interaction effects among the movements on the stroke patient's rehabilitation.Zhai et al. [9] developed a nonparametric concurrent regression model to study the relationship between functional movements and the CAHAI score.However, they did not consider the interaction effects of functional movements on the CAHAI score.We aim to examine the interaction effects of movements on CAHAI scores and predict the rehabilitation level of stroke patients in this paper.
In this paper, we apply the following nonparametric concurrent regression model (NCRM) to model stroke rehabilitation data: where f is a bivariate functional to be estimated nonparametrically, response y i (t) is a function of t, covariate x i (t, •) is a vector of functional with length q, and i (t) is a random error.To explore the interaction effects among components of covariates x i (t, •), we use the smooth spline analysis of variance (SS ANOVA) method [10,11] to decompose regression function f .A multivariate function can be decomposed of main effects and interaction effects via the SS ANOVA method ( [10,11]).When the dimension of covariates q is large, the decomposed model contains a large number of interaction effects.Even if only the main effects and second-order interaction terms are investigated, the order of the number of decomposition terms is O(q 2 ), which leads to a highly complicated model.To model stroke rehabilitation data with q = 3, there are 22 terms, including the main effects and interaction effects.This challenges the estimation method for the NCRM model.To avoid this shortcoming, Zhai et al. [9] took all functional covariates as a whole and did not consider interaction effects among covariates.Following [12], this paper conducts a model selection method for the NCRM model with all main effects and interaction effects.In this method, the regression function is estimated and significant components of the decomposition are selected simultaneously.
Model selection is a crucial step in building statistical models that accurately capture the relationships between variables ( [13,14]).It can choose the most suitable model from a set of candidate models based on certain criteria such as goodness-of-fit, predictive performance, and interpretability.Based on the SS ANOVA approach, model selection is crucial to determine the contribution of each component of the decomposition to the overall variance of the response variable.Several methods have been proposed for selecting models with SS ANOVA, including forward selection, backward elimination, and stepwise regression ( [15][16][17][18][19][20]). However, these methods are limited in their ability to handle highdimensional data and identify complex interactions among variables.Hence, regularization methods such as the L 1 penalty have gained popularity in recent years ( [12,[21][22][23][24][25]), which allow for the selection of sparse and robust models.For example, Zhang et al. [23] developed a nonparametric penalized likelihood method with the likelihood basis pursuit and used it for variable selection and model construction.Lin and Zhang [22] proposed a component selection and smoothing method for multivariate nonparametric regression models by penalizing the sum of component norms of SS ANOVA.Furthermore, Wang et al. [12] developed a unified framework for estimation and model selection methods in nonparametric function-on-function regression, which performs well when using L 1 penalty methods for model selection.Dong and Wang [24] proposed a nonparametric method for learning conditional dependence structure in graph models by applying L 1 regularization to detect the neighborhoods of edges, where SS ANOVA decomposition is used to depict interaction effects of edges in the graph model.In this paper, we borrowed the L 1 regularization idea to build model selection by penalizing the sum of norms of the ANOVA decomposition components for the NCRM model.In addition, Bayesian analysis methods can also be used to study interaction effects; for example, Ren et al. [26] proposed a novel semiparametric Bayesian variable selection model for investigating linear and nonlinear gene-environment interactions simultaneously, allowing for structural identification.
This paper proposes an estimation and model selection approach for the NCRM model (1).Following [12,22], the SS ANOVA decomposition for the tensor product space of the reproducing kernel Hilbert spaces (RKHS) is constructed, and the L 1 penalty approach for the components of the decomposition is implemented.We use estimation procedures under either an L 1 or a joint L 1 and L 2 penalty to fit teh NCRM model.We study the interaction effects of the covariate x i (t, •) in model (1) via ANOVA decomposition of the regression function, where the tensor product RKHS is built based on Gaussian kernels.The decomposition is different from that of Zhai et al. [9], where they took the covariate as a whole variable and did not consider their interaction effects.Based on the decomposition, model selection with the tensor product RKHS is conducted using the L 1 penalty method.With regards to the covariate x i (t, •), the models from Wang et al. [12] are not suitable to analyze the stoke data.In this paper, we apply the proposed method to stroke rehabilitation data and study the relationship of the movements and the patient's CAHAI score.Besides the main effects, the interaction effect of the movements is also detected.
The remainder of the article is organized as follows.In Section 2, we present the tensor product RKHS with the Gaussian kernel and the SS ANOVA decomposition of the regression function.In Section 3, we show model selection and estimation procedures.The simulation study and application of stroke rehabilitation data are presented in Sections 4 and 5.We conclude in Section 6.

Nonparametric Concurrent Regression Model
For the NCRM model (1), we consider x i (t, •) = (x i1 (t, •), . . ., x iq (t, •)), where x ij (t, s) : S → R for any fixed time t ∈ T is a function of s within a space denoted by X j , j ∈ 1, • • • , q.Generally, t and s can be transformed into [0, 1].For simplicity, we let T = [0, 1] and S = [0, 1] and let X j ⊂ L 2 [0, 1], j = 1, • • • , q, which are independent of t.Furthermore, we assume that y i (t) ∈ Y ⊂ L 2 [0, 1] and i (t) for i = 1, . . ., n are identically and independently distributed in L 2 [0, 1] with mean zero and It is shown that the regression function f is a functional function with an independent covariate x i (t, •).To provide a nonparametric estimation of f , the SS ANOVA decomposition method is used to construct a tensor product space of RKHS to which f belongs.
When f is treated as a function with respect to the first augment t ∈ T , we consider the Sobolev space [10], where H (1) can be rewritten as where {1} is a constant space, {t} is a linear function space with t as an independent variable.H 2 is a smooth function space orthogonal to the constant space and the linear function space.Reproducing kernels (RK) for these three subspaces are K (1) where k 1 , k 2 and k 4 are defined as .
For functional augments x(t, •), RK and its corresponding RKHS for f as a function of functions in X = X 1 × • • • × X q are constructed as follows.For any u j , u j ∈ X j , we construct a Gaussian kernel as where u j 2 = 1 0 u 2 j (s)ds.We can show that when the space X j is a complete space, K (2) 2,j is a symmetric and strictly positive definite.The unique RKHS H (2) 2,j is separable and does not contain any non-zero constants.To construct an SS ANOVA decomposition, we let H (2) 2,j .Then, the tensor product space in this paper is q with the following decomposition: Decomposition ( 4) is different from that of Zhai et al. [9] where H (2) is decomposed of constant space {1} and another RKHS not considering interaction among x(t, •).
Next, we consider the tensor product space H = H (1) ⊗ H (2) which has the following decomposition: There, the null space {1} ⊕ {t} stands for the main effect of the parametric form of t, H 2 is the main effect of the non-parametric form of t, H 2,j is the main effect of the nonparametric form of u j , {t} ⊗ H (2) 2,j is the linear nonparametric interaction between t and u j , H 2,j is the nonparametric nonparametric interaction between t and u j , and so on, H (1) 2,q is the nonparametric nonparametric interaction between t and u, where u = (u 1 , . . ., u q ).We denote φ 1 (t, u) = 1 and φ 2 (t, u) = k 1 (t) as the basis functions of H 0 .For example, with q = 3, the RKs corresponding to the above sub-RKHS are 2,j (u j , u j ), 2,j (u j , u j ), 2,j (u j , u j ), for j, l = 1, 2, 3 and j < l, where the left and right parts stand for the tensor product spaces and their corresponding RKs, respectively.

Model Selection and Estimation
We let the projection of f onto H 0 be ∑ 2 k=1 d k φ k (t, u), u = (u 1 , . . ., u q ), and {H 1 , • • • , H Q } be the sub-RKHS generated by the tensor product method in Section 2, where Q is a number of sub-RKHS.L 1 penalties are applied to coefficients d k for the space H 0 and components of the decomposition of f (projections of f onto H j , j = 1, • • • , Q).We estimate f by minimizing the following penalized least squares: where f ∈ H, P v is the projection operator onto H j , • H is a norm induced from H, λ 1 and λ 2 are tuning parameters, and 0 ≤ w 1k , w 2,v < ∞ are pre-specified weights.We may set w 11 = 0 when φ 1 = 1 to avoid penalty to the constant function.
Since the response function is a stochastic process in the L 2 [0, 1] space, there exists a set of orthogonal basis functions {η k (t 27]).We let We assume that {L ik } are bounded linear functionals.With EFPC, functional data can be transformed to scalar data such that modeling and analysis can be conducted by using traditional statistical methods.It can show that the PLS (5) based on functional data y i (t) reduces to the following PLS based on scalar data {ν ik }: By Lemma 3.1 in Wang et al. [12], minimizing the PLS ( 6) is equivalent to minimizing the following PLS: subject to θ v ≥ 0 for 1 ≤ v ≤ Q, where λ 1 , τ 0 , τ 1 are tuning parameters.
We let To provide an RK with linear combination of its subspaces RK for the space of H * , we define a new inner product in H * , where < •, • > is the inner product in H.Under the new inner product, the RK of H * 1 is where coefficient θ v can measure the contribution of the components of the decomposition to the model.Next, we use the reproducing property of the kernel function to transform the infinite-dimensional optimization problem (7) into a finite-dimensional solution problem.

Statistical Properties
In this section, we assume that X and Y are complete measurable spaces.We let P be a probability measure on X q × L 2 (T ) and M = T × X q .Without the loss of generality, we let the terms 6) be combined into f H .We define a loss function, where y(t) ∈ Y and x ∈ X q .The corresponding L-risk function (Steinwart and Christmann [28]) is Obviously, f = f D,λ .We state convergence properties in the following theorem and show its proofs in Appendix B. Theorem 2. Assume that f : M → R is measurable for any f ∈ H, M is a complete measurable space, and |P| 2 = X q ×L 2 (T ) y(t) 2  2 dP(x, y) < ∞.When λ → 0 and λ 6 n → ∞ as n → ∞, we have Theorem 2 states that as λ tends to 0 and λ 6 n tends to infinity as n tends to infinity, the function estimate f is L-risk consistent (Steinwart and Christmann [28]).

Simulation
In this section, numerical experiments are studied to evaluate the performance of the proposed model selection approach.Functional covariate take , and x * ij (t, •) follows a Gaussian process with mean function µ(t) = t.Kernel function for the GP takes the RBF kernel k g (s 1 , s 2 ) = exp(−(s 1 − s 2 ) 2 /2) for j = 1 and the rational quadratic kernel k l (s 1 , Three functions for f (t, x(t, •)) are presented as follows: for t ∈ [0,1], x 1 (t, s)x 2 (t, s)ds.
We see that M 1 has the main effect of t, M 2 consists of three main effects and the nonparametric nonparametric interaction of x 1 and x 2 , M 3 consists of the main effect of t, and the nonparametric nonparametric interaction of x 1 and x 2 .Random error i (t) follows N(0, 0.2 2 ) and N(0, 0.5 2 ).All simulations are repeated 200 times.We generate n samples {y i (t), x i (t, •) : i = 1, . . ., n} as training data, and n t = 50 samples { y i (t), x i (t, •) : i = 1, . . ., n} as test data.For comparison, we evaluate the performance using the following root mean square error (RMSE) on the test data: , where • 2 is the norm of L 2 (T ).
The proposed model selection method is used to train the model and predict the test data, denoted by L 1 .Not considering model selection, we use the L 2 penalty method to estimate the NCRM model denoted by L 2 , which minimizes the following objective function, where λ is the tuning parameter.After model selection, the selected model is estimated with the L 2 penalty method, which is denoted by L 1 + L 2 .Table 1 shows the average RMSE and standard deviation in parentheses for these three estimation methods, L 1 , L 2 and L 1 + L 2 .We see that under models M1 and M3, L 1 + L 2 have the smallest RMSEs among these three estimation methods.Under model M2, L 1 has better performance than L 2 and comparable results with those of L 1 + L 2 .In addition, for the three different methods, prediction performance improves as the σ decreases and the training sample size increases.
To evaluate the performance of model selection by the L 1 penalty method, we take three measurement indices in Wang et al. [12], specificity (SPE), sensitivity (SEN) and where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.The non-zero components of the decomposition of the regression function are considered as positive samples.For θ v > 0, its estimated value is larger than 0, which is considered a true positive.Table 2 shows the sensitivities, specificities, and F1 scores.Overall, the L 1 penalty method performs well in different simulation settings.In addition, model selection becomes better with decreasing σ and increasing training sample size.

Application
The proposed model selection approach is applied to analyze stroke rehabilitation data with 70 stroke survivors ( [7]).
The data consist of 34 acute patients with an incidence of stroke less than a month ago and 36 chronic patients with an incidence of stroke more than six months ago.To improve upper limb functions for stroke patients, a convenient home-based rehabilitation system via action video games with 3D-position movement behaviors has been developed [7,8].The patients played the movement game at scheduled times.For each visit time t, the impairment level of subject i was assessed using a measure called CAHAI (Chedoke Arm and Hand Activity Inventory) score, denoted as y i (t), and movements such as forward circular movement and sawing movement were recorded.In this paper, three movements, forward circular movement of the parental limb from the x-axis (x i1 = LA05.lx),sawing movement of the parental limb from the y-axis (x i2 = LA09.ly),and the movement of the non-parental limb from the direction of the x-axis (x i3 = LA28.rqx)are taken as functional covariates.For the purpose of illustrating the proposed method, we use the data from acute patients.During the three-month study period, each acute patient received up to seven assessments, which resulted in 173 observations.CAHAI scores were normalized before analysis.
In this paper, we focus on the interaction effect upon the order of two, and take the following decomposition: 2,l (u l , u l ), for j, l = 1, 2, 3 and j < l.Readers can also choose various kinds of SS ANOVA decomposition by merging kernel functions according to their own needs.From Section 3, we have where coefficient θ v for kernel function K v provides levels of contribution of K v to the overall model.The penalty method with L 1 regularization for model selection is applied to stroke rehabilitation data.Parameters {θ v } are computed, and values larger than 0 are θ 2 = 4.157, θ 3 = 0.819, θ 4 = 0.636, θ 7 = 0.592 and θ 10 = 0.741.This shows that the main effects of x i1 (t, •), x i2 (t, •) and x i3 (t, •), the linear nonparametric interaction of t and x 3 (t, •) and the nonparametric-nonparametric interaction of x 2 (t, •) and x 3 (t, •) have nonzero contributions to the CAHAI score.Thus, the three movements, forward circular movements of the parental limb, awing movements of the parental limb and of the non-parental limb, may be helpful to the recovery of stroke patients.In addition, the interaction of awing movements of the parental limb and the non-parental limb, may contribute to the level of daily life dependence or upper limb function impairment.Figure 1 plots the estimates of nonparametric regression functions for four stroke patients.We can see that the regression function in the NCRM model has the same trend as the scores of CAHAI, and on the whole, they all show an increasing trend along with fluctuating trends, which shows that movements may improve upper limb function for stroke patients.
Prediction performance of the proposed method is evaluated using a tenfold cross-validation, , where Ŷ(−j) i (t) is the predicted value of Y i (t) based on the fitted selected model with the L 1 + L 2 penalty and the data excluding the jth fold.The RPE for Stoke data is 1.0690, which is smaller than 1.1700 from the method of Zhai et al. [9].

Conclusions
For functional data with functional covariate inputs, this paper applies the Gaussian kernel function to construct the tensor product RKHS to model the regression function.This leads to a nonparametric concurrent regression model.The L 1 penalty method is used to detect components of the SS ANOVA decomposition of the regression function, which has nonzero contribution to model fit.The backfitting algorithm is developed to estimate the model selection.The proposed method is applied to stroke rehabilitation data, and the results show that besides the main effects, there are interaction effects of the movements on the CAHAI score.This indicates that movements may help improve the level of daily life dependence or impairment of upper limb function of a stroke patient.
Since |P| 2 and K H H are bounded, without loss of generality, we assume that q = 1, |P| 2 = 1, and K H = 1, where K H is the RK of H and then R L,P (0) ≤ |P| 2 = 1.From the proof of Theorem 3 in Zhai et al. [9], we have where c is an indeterminate constant depending on L. We know that Hence, when f P,λ − f H ≤ where Φ((t, x(t, •))) = K H (•, (t, x(t, •))) is a canonical map.We let h(x, y) = −2(y − f P,λ (x)).Following the proof of Theorem 5.9 in [28], we show that f P,λ − f P,λ , E P hΦ − E P hΦ + λ f P,λ − f P,λ 2 H ≤ 0, where P is any distribution defined on X q × L 2 (T ).

Algorithm 1
Model Selection Algorithm Set initial value d = d 0 , θ = θ 0 .repeat Update c by minimizing 1 n Y − Td − Σc 2 + τ 0 c Σc Calculate Y * = Y − Rθ, where R is an n × Q matrix with the v-th column being w −1 2,v Σ v c Update d by minimizing 1 n Y * − Td 2 + λ 1 ∑ 2 k=1 w 1k |d k | select tuning parameter M by the k-fold cross-validation or BIC method Update θ by minimizing 1 n Y − Td − Rθ 2 + τ 0 c Rθ subject to θ v ≥ 0 for 1 ≤ v ≤ Q and w 2 θ ≤ M until c, d and θ converge return c, d and θ

Figure 1 .
Figure 1.CAHAI scores (red line) and their corresponding fitted values (blue line) for 4 patients.

Table 1 .
Average values and standard deviations of RMSEs.(Methods corresponding to bold numbers perform best).